Self-supervised representation learning follows a paradigm of withholding some part of the data and tasking the network to predict it from the remaining part. Towards this end, masking has emerged as a generic and powerful tool where content is withheld along the sequential dimension, e.g., spatial in images, temporal in audio, and syntactic in language. In this paper, we explore the orthogonal channel dimension for generic data augmentation. The data for each channel is quantized through a non-uniform quantizer, with the quantized value sampled randomly within randomly sampled quantization bins. From another perspective, quantization is analogous to channel-wise masking, as it removes the information within each bin but preserves the information across bins. We apply the randomized quantization in conjunction with sequential augmentations on self-supervised contrastive models. This generic approach achieves results on par with modality-specific augmentation on vision tasks, and state-of-the-art results on 3D point clouds as well as on audio. We also demonstrate that this method is applicable for augmenting intermediate embeddings in a deep neural network on the comprehensive DABS benchmark, which comprises various data modalities. Code is available at http://www.github.com/microsoft/random_quantize.
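The randomized quantization described in this abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function name `randomized_quantize`, the parameter `n_bins`, the uniform sampling of bin edges between each channel's minimum and maximum, and the channels-first layout are all assumptions made for illustration.

```python
import numpy as np

def randomized_quantize(x, n_bins=8, rng=None):
    """Quantize each channel of x (shape: channels x ...) with random bins.

    Hypothetical sketch: bin edges and per-bin output values are both
    sampled uniformly at random, which is an assumption, not the paper's spec.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x, dtype=np.float64)
    out = np.empty_like(x)
    for c in range(x.shape[0]):
        lo, hi = x[c].min(), x[c].max()
        # Random interior edges give non-uniform bins over [lo, hi].
        inner = np.sort(rng.uniform(lo, hi, size=n_bins - 1))
        edges = np.concatenate(([lo], inner, [hi]))
        # One random representative value per bin, drawn inside that bin.
        values = rng.uniform(edges[:-1], edges[1:])
        # Bin index of every element; clip so x == hi falls in the last bin.
        idx = np.clip(np.searchsorted(edges, x[c], side="right") - 1, 0, n_bins - 1)
        out[c] = values[idx]
    return out
```

Calling `randomized_quantize(batch, n_bins=8)` on a channels-first array returns an array of the same shape in which each channel takes at most `n_bins` distinct values. Differences within a bin are erased, while the ordering across bins is kept, which matches the channel-wise masking view in the abstract.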
translated by Google Translate
We propose a sparse end-to-end multi-person pose regression framework, termed QueryPose, which can directly predict multi-person keypoint sequences from the input image. The existing end-to-end methods rely on dense representations to preserve the spatial detail and structure for precise keypoint localization. However, the dense paradigm introduces complex and redundant post-processes during inference. In our framework, each human instance is encoded by several learnable spatial-aware part-level queries associated with an instance-level query. First, we propose the Spatial Part Embedding Generation Module (SPEGM) that considers the local spatial attention mechanism to generate several spatial-sensitive part embeddings, which contain spatial details and structural information for enhancing the part-level queries. Second, we introduce the Selective Iteration Module (SIM) to adaptively update the sparse part-level queries via the generated spatial-sensitive part embeddings stage-by-stage. Based on the two proposed modules, the part-level queries are able to fully encode the spatial details and structural information for precise keypoint regression. With bipartite matching, QueryPose avoids hand-designed post-processes and surpasses the existing dense end-to-end methods with 73.6 AP on the MS COCO mini-val set and 72.7 AP on the CrowdPose test set. Code is available at https://github.com/buptxyb666/QueryPose.
Multi-task learning (MTL) models have demonstrated impressive results in computer vision, natural language processing, and recommender systems. Even though many approaches have been proposed, how well these approaches balance different tasks on each parameter still remains unclear. In this paper, we propose to measure the task dominance degree of a parameter by the total updates of each task on this parameter. Specifically, we compute the total updates by the exponentially decaying Average of the squared Updates (AU) on a parameter from the corresponding task. Based on this novel metric, we observe that many parameters in existing MTL methods, especially those in the higher shared layers, are still dominated by one or several tasks. The dominance of AU is mainly due to the dominance of accumulative gradients from one or several tasks. Motivated by this, we propose a Task-wise Adaptive learning rate approach, AdaTask in short, to separate the accumulative gradients and hence the learning rate of each task for each parameter in adaptive learning rate approaches (e.g., AdaGrad, RMSProp, and Adam). Comprehensive experiments on computer vision and recommender system MTL datasets demonstrate that AdaTask significantly improves the performance of dominated tasks, resulting in SOTA average task-wise performance. Analysis on both synthetic and real-world datasets shows that AdaTask balances parameters in every shared layer well.
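The two ideas in this abstract, the AU metric and task-wise accumulators, can be sketched for an RMSProp-style optimizer as follows. This is a hedged illustration, not the AdaTask reference code: the helper names `init_state` and `adatask_rmsprop_step`, the choice of RMSProp, the shared decay `beta` for both accumulators, and the summation of per-task updates are assumptions.

```python
import numpy as np

def init_state(param, n_tasks):
    """Hypothetical optimizer state: one accumulator pair per task."""
    return {
        "v": [np.zeros_like(param) for _ in range(n_tasks)],   # squared gradients
        "au": [np.zeros_like(param) for _ in range(n_tasks)],  # squared updates (AU)
    }

def adatask_rmsprop_step(param, task_grads, state, lr=0.01, beta=0.9, eps=1e-8):
    """One RMSProp-style step with a separate accumulator per task (assumed form)."""
    new_param = param.copy()
    for k, g in enumerate(task_grads):
        # Task-wise accumulator: only task k's gradient history enters v[k],
        # so a task with large gradients cannot shrink another task's step size.
        state["v"][k] = beta * state["v"][k] + (1 - beta) * g ** 2
        update = -lr * g / (np.sqrt(state["v"][k]) + eps)
        # AU: exponentially decaying average of task k's squared updates.
        state["au"][k] = beta * state["au"][k] + (1 - beta) * update ** 2
        new_param += update
    return new_param
```

With a single shared accumulator, a task whose gradients are much larger would dominate both the accumulator and the resulting updates. In this sketch each task is normalized by its own history, so the per-task AU values stay comparable even when gradient scales differ by orders of magnitude.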
Given a possibly false claim sentence, how can we automatically correct it with minimal editing? Existing methods either require a large number of pairs of false and corrected claims for supervised training or do not handle errors spanning multiple tokens within an utterance well. In this paper, we propose VENCE, a novel method for factual error correction (FEC) with minimal edits. VENCE formulates the FEC problem as iteratively sampling editing actions with respect to a target density function. We carefully design the target function with predicted truthfulness scores from an offline trained fact verification model. VENCE samples the most probable editing positions based on back-calculated gradients of the truthfulness score with respect to the input tokens, and samples the editing actions using a distantly-supervised language model (T5). Experiments on a public dataset show that VENCE improves the well-adopted SARI metric by 5.3 (or a relative improvement of 11.8%) over the previous best distantly-supervised methods.
Aspect-based sentiment analysis (ABSA) aims at extracting opinionated aspect terms in review texts and determining their sentiment polarities, and is widely studied in both academia and industry. As it is a fine-grained classification task, the annotation cost is extremely high. Domain adaptation is a popular solution to alleviate the data deficiency issue in new domains by transferring common knowledge across domains. Most cross-domain ABSA studies are based on structure correspondence learning (SCL), and use pivot features to construct auxiliary tasks for narrowing down the gap between domains. However, their pivot-based auxiliary tasks can only transfer knowledge of aspect terms but not sentiment, limiting the performance of existing models. In this work, we propose a novel Syntax-guided Domain Adaptation Model, named SDAM, for more effective cross-domain ABSA. SDAM exploits syntactic structure similarities for building pseudo training instances, during which aspect terms of the target domain are explicitly related to sentiment polarities. Besides, we propose a syntax-based BERT masked language model for further capturing domain-invariant features. Finally, to alleviate the sentiment inconsistency issue in multi-gram aspect terms, we introduce a span-based joint aspect term and sentiment analysis module into the cross-domain End2End ABSA. Experiments on five benchmark datasets show that our model consistently outperforms the state-of-the-art baselines with respect to the Micro-F1 metric for the cross-domain End2End ABSA task.
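Chapter generation has become a practical technique for online videos. Chapter breakpoints enable users to quickly find the parts they want and to get summarizing annotations. However, there is no public method or dataset for this task. To facilitate research in this direction, we introduce a new dataset called Chapter-gen, which consists of approximately 10K user-generated videos with annotated chapter information. Our data collection process is fast, scalable, and does not require any additional manual annotation. On top of this dataset, we design an effective baseline specifically for the video chapter generation task. It captures two aspects of a video, namely visual dynamics and narration text, and uses local and global video features for localization and title generation, respectively. To parse long videos efficiently, a skip sliding window mechanism is designed to localize potential chapters, and a cross-attention multi-modal fusion module is developed to aggregate local features for title generation. Our experiments show that the proposed framework achieves superior results over existing methods, which indicates that method designs for similar tasks cannot be transferred directly, even after fine-tuning. Code and dataset are available at https://github.com/czt117/mvcg.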
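As one of the most successful AI-powered applications, recommender systems aim to help people make appropriate decisions in an effective and efficient way by providing personalized suggestions in many aspects of our lives, especially for various human-oriented online services such as e-commerce platforms and social media sites. In the past few decades, the rapid development of recommender systems has significantly benefited humans by creating economic value, saving time and effort, and promoting social good. However, recent studies have found that data-driven recommender systems can pose serious threats to users and society, such as spreading fake news to manipulate public opinion on social media sites, amplifying unfairness toward under-represented groups or individuals in job matching services, or inferring private information from recommendation results. Therefore, the trustworthiness of these systems has been attracting increasing attention from various aspects, with the aim of mitigating the negative impacts caused by recommender systems and enhancing the public's trust in recommender system techniques. In this survey, we provide a comprehensive overview of trustworthy recommender systems (TREC), with a specific focus on six of the most important aspects: safety and robustness, non-discrimination and fairness, explainability, privacy, environmental well-being, and accountability and auditability. For each aspect, we summarize recent related techniques and discuss potential research directions to help achieve trustworthy recommender systems in the future.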
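This paper proposes Salenet, an end-to-end convolutional neural network (CNN) for sustained attention level evaluation using prefrontal electroencephalogram (EEG). A bias-driven pruning method is proposed together with group convolution, global average pooling (GAP), near-zero pruning, weight clustering, and quantization for model compression, achieving a total compression ratio of 183.11x. The compressed Salenet obtains a state-of-the-art subject-independent sustained attention level classification accuracy of 84.2% on the 6-subject EEG database recorded in this work. Salenet is implemented on an Artix-7 FPGA with a competitive power consumption of 0.11 W and an energy efficiency of 8.19 GOPS/W.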
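Lateral inhibitory connections have been observed in the cortex of the biological brain, and their role in cognitive functions has been studied extensively. However, in the vanilla version of backpropagation in deep learning, all gradients (which can be understood to comprise both signal and noise gradients) flow through the network during weight updates. This may lead to overfitting. In this work, inspired by biological lateral inhibition, we propose Gradient Mask, which effectively filters out noise gradients during backpropagation. This allows the learned feature information to be stored more strongly in the network while filtering out noisy or unimportant features. Furthermore, we demonstrate analytically how lateral inhibition in artificial neural networks improves the quality of the propagated gradients. A new criterion for gradient quality is proposed, which can be used as a measure during the training of various convolutional neural networks (CNNs). Finally, we conduct several different experiments to study how Gradient Mask improves the performance of the network both quantitatively and qualitatively. Quantitatively, accuracy in the original CNN architecture, accuracy after pruning, and accuracy after adversarial attacks have shown improvements. Qualitatively, the CNN trained using Gradient Mask develops saliency maps that focus primarily on the object of interest, which is useful for data augmentation and network interpretability.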
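Federated learning (FL) supports distributed training of a global machine learning model across multiple clients with the help of a central server. The local dataset held by each client is never exchanged in FL, so the privacy of local datasets is protected. Although FL is increasingly popular, data heterogeneity across clients leads to the client model drift issue and results in degraded model performance and poor model fairness. To address this issue, in this paper we design federated learning with a global-local knowledge fusion scheme (FEDKF). The key idea in FEDKF is to let the server return the global knowledge in each training round to be fused with the local knowledge, so that the local model can be regularized toward the global optimum. The client model drift issue can therefore be mitigated. In FEDKF, we first propose an active model aggregation technique that supports precise global knowledge representation. Then, we propose a data-free knowledge distillation (KD) approach to facilitate KD from the global model to the local model while the local model can still learn the local knowledge (embedded in the local dataset) simultaneously, thereby realizing the global-local knowledge fusion process. Theoretical analysis and intensive experiments demonstrate that FEDKF achieves high model performance, high fairness, and privacy preservation simultaneously. The project source code will be released on GitHub after the paper review.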